Conversation
✅ Deploy Preview for oasisprotocol-oasis-core canceled.
|
e7dc2b2 to
230747b
Compare
martintomazic
left a comment
There was a problem hiding this comment.
I would like to reach consensus on 1...3 from the context before proceeding.
Personally, I think only e2e test would be simplest and make me most confident. There would be some boilerplate though + more resources on the CI, although it will be removed eventually.
Otherwise totally in for making our code testable via mocks etc...
| // Test diff sync protocol. | ||
| diffClient := diffsync.NewClient(host, chainContext, runtimeID) | ||
| time.Sleep(1 * time.Second) |
There was a problem hiding this comment.
This is already flaky in the CI. Ideally there should be explicit synchronization or increase the timeout.
Let's see if we opt for e2e only with config.
| func init() { | ||
| peermgmt.RegisterNodeHandler(&peermgmt.NodeHandlerBundle{ | ||
| ProtocolsFn: func(n *node.Node, chainContext string) []core.ProtocolID { | ||
| if !n.HasRoles(node.RoleComputeWorker | node.RoleStorageRPC) { |
There was a problem hiding this comment.
Probably we should add observer nodes as well (also elsewhere).
| func (bm *backendMock) GetCheckpoints(_ context.Context, request *checkpoint.GetCheckpointsRequest) ([]*checkpoint.Metadata, error) { | ||
| cpVersion = request.Version | ||
| root = node.Root{ | ||
| // Existing bug: storagesync p2p server does not set namespace, nor does checkpoint.ChunkProvider validates it. |
There was a problem hiding this comment.
Not sure if this should be considered a bug, because the protocol is always per runtime so the namespace is implicit.
There was a problem hiding this comment.
You are right, I will simply ignore this like other part of the code do.
Possibly I could add a comment to the api request definition that this filed is redundant/not used.
There was a problem hiding this comment.
If we use cbor codec for grpc, omitting unused field should be both forward and backward compatible? So possibly we could remove it altogether (separate PR of course). Else I will just leave a comment.
There was a problem hiding this comment.
What do you mean remove it? The same root structure is used in places where it needs the namespace.
There was a problem hiding this comment.
I would just leave a comment for now.
There was a problem hiding this comment.
What do you mean remove it? The same root structure is used in places where it needs the namespace.
I meant removing the namespace from request (backend) not from root:
Update: or update all callsites to pass it, add new field to storage/checkpoint sync request and validate in the backend (omitting validation in case of zero value not to be breaking).
Will go with comment for this PR as this would be out of scope. :)
Note that as I see it currently, you always register for the checkpoint sync protocol, even if the checkpointer is disabled? In this case, this alone will not solve the above issue. |
Yeah that could become complicated fast and the e2e suite likely needs a major refresh so I would do it separately.
I think copying is the safer option. |
Nice catch! I guess correct solution to it is the complementary solution from the #5750 (comment)
^^ or independent and then only register if checkpointer is enabled. update: This has to be independent method. E.g. registering light client protocol server should not start advertisement as peers are found here by other means. |
a9bd6b5 to
462a7eb
Compare
|
Let's get #6270 before this one to also tackle #6262 (comment) here. Will handle remaining threads later on. |
|
|
||
| // NewClient creates a new RPC client for the given protocol. | ||
| func NewClient(h host.Host, p protocol.ID) Client { | ||
| func NewClient(h host.Host, p protocol.ID, fallback ...protocol.ID) Client { |
There was a problem hiding this comment.
I think you need to accept pids []protocol.ID because go-libp2p implementation doesn't guarantee that the first ID will be used if possible.
There was a problem hiding this comment.
Isn't this just syntactic sugar for slice? The main problem as you point out is that interface does not promise order.
In practice, implementation favors protocols from left to right.
Ideally, we should not rely on this. If we had many methods as part of our protocol likely this would be the only sensible approach. But we don't, so maybe we can make it simpler.
| } | ||
| } | ||
|
|
||
| func (s *service) handleGetDiff(ctx context.Context, request *GetDiffRequest) (*GetDiffResponse, error) { |
There was a problem hiding this comment.
| func (s *service) handleGetDiff(ctx context.Context, request *GetDiffRequest) (*GetDiffResponse, error) { | |
| func (s *service) handleGetDiff(ctx context.Context, req *GetDiffRequest) (*GetDiffResponse, error) { |
| } | ||
|
|
||
| // NewServer creates a new storage diff protocol server. | ||
| func NewServer(chainContext string, runtimeID common.Namespace, backend api.Backend) rpc.Server { |
There was a problem hiding this comment.
To avoid returning interface, I would create function NewService and let the caller create rpc.Server. Or create 2 functions.
| ) | ||
|
|
||
| // Client is a checkpoints sync protocol client. | ||
| type Client interface { |
There was a problem hiding this comment.
Do we need this interface?
| // Warning: This client only registers the checkpoint sync protocol with the P2P | ||
| // service. To enable advertisement of the legacy protocol, it must be registered | ||
| // separately. | ||
| func NewClient(p2p rpc.P2P, chainContext string, runtimeID common.Namespace) Client { |
There was a problem hiding this comment.
Move constructor after struct type definition.
| // separately. | ||
| func NewClient(p2p rpc.P2P, chainContext string, runtimeID common.Namespace) Client { | ||
| pid := protocol.NewRuntimeProtocolID(chainContext, runtimeID, CheckpointSyncProtocolID, CheckpointSyncProtocolVersion) | ||
| fallbackPid := sync.GetStorageSyncProtocolID(chainContext, runtimeID) |
There was a problem hiding this comment.
I would also prepare PR which will clean these things.
| } | ||
|
|
||
| func (c *client) getBestPeers(opts ...rpc.BestPeersOption) []core.PeerID { | ||
| return append(c.mgr.GetBestPeers(opts...), c.fallbackMgr.GetBestPeers(opts...)...) |
There was a problem hiding this comment.
I'm not sure if this will work as expected. Maybe we should have just 2 clients, everyone with its own protocol ID and then we can switch to the legacy one in methods like GetCheckpointChunk if we don't get an answer from the first one.
There was a problem hiding this comment.
Why? I think the test confirms it works, but indeed this is probably simpler and thus better, given that we only have 3 methods to fallback manually. Will think a bit after rebasing this PR on top of #6270.
In general this would also enable proper e2e testing (we could remove existing integration test). E2e test can be registered to not execute by default and used later as sanity again when removing support for both protocols.
Probably to make this work we should refactor existing storage and checkpoint sync to avoid returning custom Checkpoint type that wraps metadata and slice of peer feedbacks, but instead return slice of metadata with slice of slices of peer feedback. See. Or handle this type discrepancy on the call site. Uglier I would want but doable
| Checkpoints []*checkpoint.Metadata `json:"checkpoints,omitempty"` | ||
| } | ||
|
|
||
| // Constants related to the GetCheckpointChunk method. |
There was a problem hiding this comment.
Every constant should have its own comment.
462a7eb to
32a6ad1
Compare
|
Will consider simplifying with 2 clients again and then I tag you all for a final review :). |
32a6ad1 to
ebbbcb1
Compare
Avoid global state to enable writting unit tests with multiple peers in the same process.
Legacy storage sync protocol is still advertised and served to enable seamless rolling upgrades of the network.
Maybe I will replace this with e2e alltogether. To be consistent I should probably extend other p2p clients. On the other hand I dislike methods for testing only.
ebbbcb1 to
96be6bd
Compare
Closes #5751, blocked by #6270.
Likely solves root cause of #5750.
Open for discussion: